AITopics | scene graph

fa64505ebdc94531087bc81251ce2376-Paper-Conference.pdf

Neural Information Processing SystemsMay-1-2026, 06:35:29 GMT

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

Add feedback

fa64505ebdc94531087bc81251ce2376-Supplemental-Conference.pdf

Neural Information Processing SystemsMay-1-2026, 05:15:24 GMT

In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-intricate setting, i.e., generating intricate visual content from simple abstract text prompts. Inspired by human imagination intuition, we propose a novel scene-graph hallucination (SGH) mechanism for effective abstract-to-intricate T2I synthesis. SGH carries out scene hallucination by expanding the initial scene graph (SG) of the input prompt with more feasible specific scene structures, in which the structured semantic representation of SG ensures high controllability of the intrinsic scene imagination. To approach the T2I synthesis, we deliberately build an SG-based hallucination diffusion system. First, we implement the SGH module based on the discrete diffusion technique, which evolves the SG structure by iteratively adding new scene elements. Then, we utilize another continuous-state diffusion model as the T2I synthesizer, where the overt image-generating process is navigated by the underlying semantic scene structure induced from the SGH module. On the benchmark COCO dataset, our system outperforms the existing best-performing T2I model by a significant margin, especially improving on the abstract-to-intricate T2I generation. Further in-depth analyses reveal how our methods advance.2

Add feedback

Supplemental Material for 3DP3: 3DScene Perception via Probabilistic Programming

Neural Information Processing SystemsApr-25-2026, 20:51:57 GMT

Robust 6d object pose estimation by learning rgb-d features.

artificial intelligence, machine learning, probability, (17 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

3DP3: 3DScene Perception via Probabilistic Programming

Neural Information Processing SystemsApr-25-2026, 20:51:57 GMT

We present 3DP3, a framework for inverse graphics that uses inference in a structured generative model of objects, scenes, and images.

artificial intelligence, inference, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Add feedback

3f67fd97162d20e6fe27748b5b372509-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 14:22:45 GMT

artificial intelligence, machine learning, scene graph, (13 more...)

Neural Information Processing Systems

Country:

Asia (0.46)
North America > Canada (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Adaptive Visual Scene Understanding: Incremental Scene Graph Generation

Neural Information Processing SystemsMar-22-2026, 21:44:45 GMT

Scene graph generation (SGG) analyzes images to extract meaningful information about objects and their relationships. In the dynamic visual world, it is crucial for AI systems to continuously detect new objects and establish their relationships with existing ones. Recently, numerous studies have focused on continual learning within the domains of object detection and image recognition. However, a limited amount of research focuses on a more challenging continual learning problem in SGG. This increased difficulty arises from the intricate interactions and dynamic relationships among objects, and their associated contexts. Thus, in continual learning, SGG models are often required to expand, modify, retain, and reason scene graphs within the process of adaptive visual scene understanding.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Vision (0.62)

Add feedback

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation

Neural Information Processing SystemsMar-22-2026, 03:03:06 GMT

There has been exciting progress in generating images from natural language or layout conditions. However, these methods struggle to faithfully reproduce complex scenes due to the insufficient modeling of multiple objects and their relationships. To address this issue, we leverage the scene graph, a powerful structured representation, for complex image generation. Different from the previous works that directly use scene graphs for generation, we employ the generative capabilities of variational autoencoders and diffusion models in a generalizable manner, compositing diverse disentangled visual clues from scene graphs. Specifically, we first propose a Semantics-Layout Variational AutoEncoder (SL-VAE) to jointly derive (layouts, semantics) from the input scene graph, which allows a more diverse and reasonable generation in a one-to-many mapping. We then develop a Compositional Masked Attention (CMA) integrated with a diffusion model, incorporating (layouts, semantics) with fine-grained attributes as generation guidance. To further achieve graph manipulation while keeping the visual content consistent, we introduce a Multi-Layered Sampler (MLS) for an isolated image editing effect. Extensive experiments demonstrate that our method outperforms recent competitors based on text, layout, or scene graph, in terms of generation rationality and controllability.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)

Add feedback

f670ef96387d9a5a8a51e2ed80cb148d-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 18:01:26 GMT

machine learning, natural language, object-oriented architecture, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Singapore (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
(6 more...)

Add feedback

SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models

Neural Information Processing SystemsFeb-18-2026, 17:10:36 GMT

Vision Language Models (VLMs) have demonstrated remarkable performance in 2D vision and language tasks. However, their ability to reason about spatial arrangements remains limited. In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs' spatial perception and reasoning capabilities.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

South America > Brazil (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.84)

Add feedback

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image Y u Zhao

Neural Information Processing SystemsFeb-18-2026, 09:38:08 GMT

In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understanding, due to the difficulty of 3D-wise spatial feature modeling.

diffusion model, large language model, machine learning, (15 more...)

Neural Information Processing Systems

Country: